Skip to content

Conversation

@JosephTLockwood
Copy link

Adds left, right, inner, and outer joins. It also has some tiny bug fixes.

(cherry picked from commit 399777f)
(cherry picked from commit b291f05)
(cherry picked from commit 83dc405)
(cherry picked from commit 17ef1c5)
(cherry picked from commit 89a7acf)
(cherry picked from commit 7853852)
(cherry picked from commit 292cb45)
@JosephTLockwood
Copy link
Author

The more I think about the functionality of this the more I feel that it should have input similar to your filter method. Thoughts?

@paul-griffith paul-griffith self-requested a review July 6, 2023 20:58
Copy link
Member

@paul-griffith paul-griffith left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be worth a larger refactor approaching this from a slightly different angle to ease testing and implementation.
What if the different join functions were regular Kotlin functions that accept the two datasets and the column args.
Then the primary unit testing effort can be focused on the plain Kotlin methods (with no need for extra Python scaffolding), and can use DSBuilder to construct test datasets more easily.
Once the actual method implementations are proven well, it's easy to write scripting-focused methods that only have to care about parsing the args and passing them to the inner functions - and since there's a lot of shared boilerplate, that can be tested as one piece as well.
So it's 4 distinct tests for the join functions, plus 1 test of Python argument parsing, instead of 4 tests with lots of overlap for argument handling. Plus, well structured, the argument handling could probably be repeated in each script function implementation, which avoids extra work if requirements for the function(s) change.

Comment on lines 92 to 93
val column = parsedArgs.requirePyObject("columnIndex").toJava<Int>()
val column2 = parsedArgs.requirePyObject("columnIndex2").toJava<Int>()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be able to just do parsedArgs.requireInteger("columnIndex") to get a primitive int back; saves some work

val listToAppend = arrayOfNulls<Any?>(combinedColumnName.size)
var row2: Int? = null

dataset2.rowIndices.forEachIndexed { rowIndex, _ ->
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe try firstNonNullOfOrNull {} here to express what you're trying to do more succintly

names = ["dataset", "columnsToSplit"],
types = [Dataset::class, Array<Array<Int>>::class],
)
fun splitter(args: Array<PyObject>, keywords: Array<String>): Array<Dataset?> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Returning as an array is a little weird - a plain list would be more typical for Python/Java/Ignition

@JosephTLockwood
Copy link
Author

I plan to revamp this. Currently, my thought is, it would be nice to have something like

system.dataset.joinOn(joinType="left",joinOn=lambda **kwargs d1, d2: d1['column1'] == d2['column2'] and d1['column1']>5

This could be a powerful tool. My current hang-up is passing two datasets into **kwargs.

@paul-griffith
Copy link
Member

paul-griffith commented Jul 19, 2023

I like that proposed syntax. You might like Phil Turmel's Simulation Aids; particularly the crazy stuff he's doing with expressions: https://forum.inductiveautomation.com/t/automation-professionals-simulation-aids-v2/74934

This could be a powerful tool. My current hang-up is passing two datasets into **kwargs.

I don't think the Kotlin function should expect the Python function to accept kwargs at all. A join is effectively a function (leftDatasetRow, rightDatasetRow) -> Boolean.
So the Kotlin side should probably be providing a tuple where each item is a dictionary columnName: valueAtRow/Col, much like system.dataset.filter/system.dataset.map do.
And then Python function would be written something like:
lambda d1, d2: d1['column1'] == d2['column2'] and d1['column1']>5

@JosephTLockwood
Copy link
Author

Thanks for pointing me back to that page. That was where I originally started my search. It appears 1 hour, and he added left join functionality 😆 https://forum.inductiveautomation.com/t/automation-professionals-simulation-aids-v2/74934/42 I'll give it a shot tomorrow. If it works, I'll leave it up to you to close this. I enjoy the learning experience of this and like how I have full customization and the fact that it currently works. However, I probably won't spend much time on it if this works, as I have a full platter and would only be able to work on it later.

@JosephTLockwood
Copy link
Author

Okay, so this should be a working left join. It was a lot simpler than I thought. I still need to clean it up and put the join type as an option, but then it should be good to go after that (I believe).

@JosephTLockwood
Copy link
Author

JosephTLockwood commented Jul 21, 2023

It should be good to go. I ended up going with column indexs instead of column names. I can change it if you want. I just though this was more elegant.

Example of Left Join:
utils.joiner(dataset, dataset2, 'left', lambda d1, d2: (d1[0] == d2[0]))

@JosephTLockwood
Copy link
Author

I plan to make a change to this. I am not sure how I overlooked this, but for some reason, I had a break after the first match was found. This makes it not a left join.

@paul-griffith
Copy link
Member

A work-in-progress refactor of what I was alluding to earlier with breaking things out to make unit testing a bit easier:
JosephTLockwood#2

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants